perm filename 106A40[1,RWF] blob sn#732910 filedate 1984-03-30 generic text, type C, neo UTF8
			       FILES

				by
		         Robert W. Floyd
			 Copyright  1983

A file is  a sequence  of characters  or data.   Because it  is stored  in
magnetic disc memory, it is permanent and there is room for it to be  very
long, but (unlike  an array) it  can only be  created or used  in a  fixed
left-to-right order.  A file may be read or written by a program.  At  any
given moment, a  file can be  in one of  three conditions (called  modes):
closed, open  for  reading,  open  for writing.   There  is  a  family  of
operations, called input operations, which can only be done on a file that
is open for reading.  The input operations bring information from the file
into the program.  There  is another family  of operations, called  output
operations, which can only  be done on  a file that  is open for  writing.
Opening a file is a  bit like checking a book  out of the library;  nobody
else can use the file while you  have it open, and  it must eventually  be
closed again.

A file in Pascal has  two names; an internal name,  by which it is  called
inside the Pascal program, and an external name, by which it is stored  in
the  computer's  directory,  and  used  in  editing  and  other  executive
operations.

Usually, a  Pascal program  allows  the user  to  say what  external  name
corresponds to each internal name; this allows a single Pascal program  to
be applied to  many different  files.  The relation  between internal  and
external names is closely analogous to the relation between parameters and
arguments.

In the  definitions and  examples  which follow,  we  shall use  ININT  as
exemplifying the  internal  name of  a  file  which has  been  opened  for
reading, OUTINT as the internal name of  a file which has been opened  for
writing, and INTNAME as a typical internal name of a file in any mode.  We
shall use INEXT, OUTEXT, and EXTNAME in the same way as the  corresponding
external names.  We shall use I, R, C, and B as typical variables of types
INTEGER, REAL, CHAR, and BOOLEAN.

If a certain file is a sequence of elements e↓1 e↓2 e↓3---e↓n, and is open
for reading, there is  always a mark, akin  to a bookmark, separating  the
part of the file that  has already been read from  the part that has  not.
In our examples, we use the @ symbol to show this mark (the actual mark is
not visible) and to show that the file is open for reading.  When the file
is first opened for reading, it looks like @e↓1 e↓2 e↓3---e↓n.  Eventually
all the elements of the file have  been read; the file looks like e↓1  e↓2
e↓3---e↓n@, and this is  called the end-of-file  condition for that  file.
We call the position of the @ symbol the read pointer.  If a file is  open
for writing, we show  this by including a  similar write pointer, #.   All
writing is done at the right end of  the file, so a file open for  writing
initially looks like just #, and later looks like e↓1 e↓2 e↓3---e↓n#.

There are files of integers, reals, Boolean values (true or false,  bits),
and characters.  There are also files of characters separated into  lines,
called text files.  Because most Pascal  file operations are done on  text
files, we treat them in detail first.

Text Files

A text file  is a  sequence of (zero  or more)  lines, where a  line is  a
sequence of (zero or more) characters followed by a carriage return symbol.
Assume that OUTINT  is the  internal name  of a  text file  that has  been
opened for writing.  The command
	WRITE(OUTINT,X)
can be used, when  X  is an  expression of type   REAL, INTEGER, CHAR,  or
BOOLEAN, to append the value of X, expressed by a string of characters, to
the end  of  the  file OUTINT.   If  the  file initially  is  ABC#,  after
WRITE(OUTINT,13) it becomes ABC__________13#.  After WRITE(OUTINT,13.0) it
becomes ABC_1.3000000E+01#.   After  WRITE(OUTINT,'D') it  becomes  ABCD#.
After WRITE(OUTINT,3>5) it becomes ABCFALSE#.  (For details of the  format
of  numbers  written  on  a  text  file,  see  __________).   The  command
WRITELN(OUTINT) is used to end a line on a text file open for writing;  it
changes ABC# to ABC↓#, where ↓ shows the carriage return (just as _  shows
a space).  Several WRITEs of values to the same file can be combined;
WRITE(OUTINT,X1,X2,X3) means the same as
	WRITE(OUTINT,X1);
	WRITE(OUTINT,X2);
	WRITE(OUTINT,X3)
and WRITELN(OUTINT,X1,X2) means the same as
	WRITE(OUTINT,X1);
	WRITE(OUTINT,X2);
	WRITELN(OUTINT).

If the text  file to which  writing is  being done has  the internal  name
OUTPUT, the WRITE and  WRITELN commands above can  be abbreviated to  just
WRITE(X1,X2,X3) and WRITELN(X1,X2);  the command WRITELN  (OUTPUT) can  be
abbreviated to WRITELN.   We say  that OUTPUT  is the  /default/ file  for
writing.  Most Pascal programmers use OUTPUT as the internal name for  the
main output text file of a program, in order to use abbreviated commands.
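
As a minimal sketch of the WRITE and WRITELN forms above, the program below
writes the same line twice, first naming the file in every command and then
using the default file.  The program name WRITEDEMO and the values written
are our own illustrative choices.  The two output lines should be identical,
but the column widths used for numbers and Boolean values depend on the
translator.

```pascal
PROGRAM WRITEDEMO(OUTPUT);
VAR I: INTEGER; R: REAL; C: CHAR; B: BOOLEAN;
BEGIN
I:= 13; R:= 13.0; C:= 'D'; B:= 3>5;
(* explicit file name, one value per WRITE *)
WRITE(OUTPUT,I);
WRITE(OUTPUT,R);
WRITE(OUTPUT,C);
WRITE(OUTPUT,B);
WRITELN(OUTPUT);
(* default file, all four values in one WRITELN *)
WRITELN(I,R,C,B)
END.
```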

Assume that ININT is the internal name of a text file that has been opened 
for reading.  The command
	READ(ININT,X)

can be used, when X is a variable of type REAL, INTEGER, CHAR, or BOOLEAN,
to take from the file a value of the proper type (expressed in characters)
and  give  it  to  X.   If  the  file  initially  is  ABC@_13_42↓,   after
READ(ININT,I) the file would  be ABC_13@_42↓, and I  would have the  value
13, as if I:=13 had been executed.  (In reading from a text file to a real
or integer variable, initial spaces  or carriage returns are passed  over,
and the reading stops  when the next  character of the  file could not  be
part of the number being read.)   If the file initially contains  ABC@DE↓,
after READ(ININT,C) the file is ABCD@E↓  and C contains 'D', as if  C:='D'
had been executed. If the file initially contains ABC@TRUE_DE↓, after READ
(ININT, B) the file is ABCTRUE@_DE↓, and B is true.  It is an error to try 
to read a file unless a value of the required type is next on the file.

The command READLN(ININT) moves the read mark past the next carriage return.
It changes ABC@DE↓FG↓ to ABCDE↓@FG↓. 

Abbreviated forms for reading several variables are
	READ(ININT,X1,X2,X3), meaning
		READ(ININT,X1);
		READ(ININT,X2);
		READ(ININT,X3) and

	READLN(ININT,X1,X2), meaning
		READ(ININT,X1);
		READ(ININT,X2);
		READLN(ININT).
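
The difference between READ and READLN can be seen in the sketch below.  It
assumes standard Pascal, where a file not named in the program header is a
scratch file; the program first writes two lines onto the scratch file F
(using REWRITE and RESET, which are described later in these notes) and then
reads them back.  The names FIRSTS and F are illustrative only.  A translator
that insists on binding a file to an external name before opening it (Free
Pascal's ASSIGN, for instance) would need that extra step.

```pascal
PROGRAM FIRSTS(OUTPUT);
VAR F: TEXT; (* scratch file, not listed in the header *)
    I: INTEGER;
BEGIN
REWRITE(F);
WRITELN(F,'13 42');
WRITELN(F,'5 6 7');
RESET(F);
READLN(F,I); (* same as READ(F,I); READLN(F) *)
WRITELN(I);  (* I is 13; the 42 was passed over *)
READ(F,I);
WRITELN(I)   (* I is 5, the first number of the second line *)
END.
```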

To test whether the next character of ININT is a carriage return, use  the
test EOLN(ININT), which  is, for example,  true if the  file is  ABC@↓DE↓,
false if it is  ABC↓@DE↓; EOLN stands for  end-of-line.  This test may  be
needed while  reading  characters,  to distinguish  spaces  from  carriage
returns, because the READ command treats  the carriage return as a  space.
None of the  above input operations  is legal if  there is an  end-of-file
condition.  To test for an end-of-file condition, use EOF(ININT), which is
true for ABC↓@ but  false for ABC@↓.   If there is  the possibility of  an
end-of-file condition, a  program should  check by EOF  before trying  any
other input operation.

If the text file from  which reading is being  done has the internal  name
INPUT, the READ, READLN, EOLN, and EOF operations above can be abbreviated
to  READ(X1,X2,X3),  READLN(X1,X2)  or   READLN,  EOLN,  and  EOF.    Most
programmers use INPUT, the default internal name, for the major input text
file of a program, in order to use abbreviated commands.

Non-Text Files

If a file is not of type TEXT, it can hold values only of one single type.
Say INTNAME is a file of integers.   Then integer values I1 and I2 can  be
written on it by WRITE(INTNAME,I1,I2) when  it is open for writing.   When
it is open for reading, values can be read from it to integer variables I1
and I2 by READ(INTNAME, I1,I2).  An end-of-file condition can be tested by
EOF(INTNAME). No other input  or output operations  can be done.   Similar
restrictions apply to files of real, boolean, or character values.   Files
of integer,  real, and  boolean values  are  not in  a form  suitable  for
printing or for reading  on the terminal screen;  they are used  primarily
for communication  between programs,  or between  successive stages  of  a
single program.

Back to Files in General

A file may be opened for reading by the command RESET(INTNAME).  It may be
opened for writing by the command  REWRITE(INTNAME).  It may be closed  by
CLOSE(INTNAME).(?)  At the  end of  a Pascal  program, all  its files  are
automatically closed.  At  the beginning  of a Pascal  program, the  files
INPUT  and  OUTPUT  are  automatically  opened  for  reading  and  writing
respectively.  Other files must be  opened explicitly before any input  or
output operations can be executed.

(?) If a Pascal program is stopped by intervention from the terminal,  the
files may not be automatically closed.  In this case, the TOPS-20  command
CLOSED may be used to close the files.

All files in Pascal except INPUT and OUTPUT must be declared as variables,
of type  either  TEXT or  FILE  OF  t, where  t  is any  type  not  itself
containing files.  (For example, FILE OF REAL and FILE OF ARRAY [1..10] OF
CHAR).  The names INPUT and OUTPUT  are implicitly declared of type  TEXT,
and should not be declared as variables.

Some files are used entirely within a single program; they are written  by
that program, later are  read during the same  execution, and are then  no
longer needed.  Such /internal/ files do not need an external name.  Input
files which  provide data  to the  program  from the  user or  some  other
external source, and output  files for printing  or other subsequent  use,
must have an external name.  Such  /external/ files must be listed in  the
program header  line.   The header  is  of the  form  

		PROGRAM  P(INTNAME1, INTNAME2,INTNAME3); 

where all internal names of external files are listed (including INPUT and
OUTPUT if they are used).  When the program is executed, the user will  be
asked for the external name corresponding to each internal name.
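
To tie together declaration, opening, and the non-text file operations, here
is a hedged sketch of an internal file of integers that is written and then
read back during one execution.  The names SCRATCH, F, and SUM are ours, and
the program assumes standard Pascal behaviour for files that are not listed
in the header.  A translator that requires every file to be bound to an
external name first (for example by Free Pascal's ASSIGN) needs that call
before REWRITE.

```pascal
PROGRAM SCRATCH(OUTPUT);
VAR F: FILE OF INTEGER; (* internal file: not listed in the header *)
    I, SUM: INTEGER;
BEGIN
REWRITE(F); (* open for writing; F is now empty *)
FOR I:= 1 TO 10 DO
	WRITE(F, I*I);
RESET(F); (* open for reading, read pointer at the left end *)
SUM:= 0;
WHILE NOT EOF(F) DO
	BEGIN
	READ(F,I);
	SUM:= SUM+I
	END;
WRITELN(SUM) (* sum of the first ten squares, 385 *)
END.
```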

It is possible for the program to ``see'' the next single element on a
file without actually moving the read pointer; if the file's internal name
is ININT, and the file contains e↓1 e↓2---e↓{i-1}@e↓i---e↓n, the expression
ININT↑ (or ININT∧ on some keyboards) has the value e↓i.  To discard
characters from a text file up to but not including the next space, for
example, one could do
	WHILE ININT↑<>' ' DO READ(ININT,C).
The read pointer can be moved one place to the right without reading, by
the command GET(ININT).  In fact, READ(ININT,C) is an abbreviation for
C:=ININT↑; GET(ININT).  The above example, then, could also be
	WHILE ININT↑<>' ' DO GET(ININT).
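
A small self-contained sketch of looking ahead and of GET follows.  It
assumes standard Pascal, writes the arrow as ^ (an accepted alternative
spelling on keyboards without ↑), and uses a scratch text file F holding the
one line SKIP THIS; the names PEEK and F are illustrative.  Translators that
require a file to be bound to an external name first (Free Pascal's ASSIGN,
for instance) need that step before REWRITE.

```pascal
PROGRAM PEEK(OUTPUT);
VAR F: TEXT; (* scratch file, not listed in the header *)
    C: CHAR;
BEGIN
REWRITE(F);
WRITELN(F,'SKIP THIS');
RESET(F);
(* discard characters up to but not including the next space *)
WHILE F^ <> ' ' DO
	GET(F);
(* F^ is now the space; READ(F,C) means C:=F^; GET(F) *)
READ(F,C);
(* copy the rest of the line to OUTPUT *)
WHILE NOT EOLN(F) DO
	BEGIN
	READ(F,C);
	WRITE(C)
	END;
WRITELN
END.
```

The program should print the single word THIS.  Like the one-line loops
above, the skipping loop assumes that a space really is present; on a file
that might end first, EOF must be tested before F^ is examined.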

Output files for printing should be  limited to the width of the  printer,
132 characters;  that  is,  your program  should  execute  WRITELN  before
writing more than 132 characters.   At most 60 lines  will fit on a  page;
the command PAGE(OUTINT) can be  used to start at  the beginning of a  new
page, even if the old one is not  full.  (If PAGE is not used, a new  page
will be used whenever a page fills up.)  Similarly, output to be viewed on
the terminal should be at most 80 characters wide and (if you want to  see
it all at once) 24 lines high.

File Cliches

To read and process every item of a non-text file, or every character of a
text file, with initialization before the first and finalization after the
last, the standard pattern in Pascal is:

	BEGIN
	RESET(f);(* not needed initially if f is INPUT*)
	initialize;

	WHILE NOT EOF(f) DO
		BEGIN
		READ(f,x);
		process datum in x
		END;

	(* all data processed, at end of file*)

	finalize
	END

If the file is a text file and  the data are not single characters, it  is
much harder  to write  a correct  program  to process  a file  of  numbers
terminated by the end-of-file. After  each number, the program must  check
each  character position in turn for a  null  (space  or  carriage return)
or end of file. It must not try to test a character in the buffer, however,
when the end-of-file has been reached.  The following program, in every
position of the file, first tests for end-of-file, leaving the iteration
if present; then tests for a null, discarding it if present.  If neither
is true, it is correct to read and process a datum.

	BEGIN
	initialize;
	(*RESET(f) if needed*)
	WHILE NOT EOF(f) DO
		BEGIN
		IF (f↑ is null) THEN GET(f)
		ELSE
			BEGIN
			READ(f,x);
			process x
			END
		END;
	(*RESET(f) if needed*)
	finalize
	END

where  GET(f)  moves forward one character position in file  f,  changing
AB@CD to ABC@D. If EOF(f) is true, GET(f) is an error.
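
As a concrete instance of this pattern, the sketch below adds up all the
integers on a text file that has stray spaces and an empty line.  It assumes
standard Pascal, in which the buffer variable holds a space when the read
pointer stands at an end-of-line, so the single test F^ = ' ' covers both
kinds of null.  The names SUMALL, F, and SUM, and the data written onto the
scratch file, are our own; a translator that requires binding a file to an
external name first (Free Pascal's ASSIGN, for instance) needs that step.

```pascal
PROGRAM SUMALL(OUTPUT);
VAR F: TEXT; (* scratch file standing in for a data file *)
    X, SUM: INTEGER;
BEGIN
REWRITE(F);
WRITELN(F,'  12 7 ');
WRITELN(F);
WRITELN(F,'30  ');
RESET(F);
SUM:= 0;
WHILE NOT EOF(F) DO
	IF F^ = ' ' THEN GET(F) (* discard a space or end-of-line *)
	ELSE
		BEGIN
		READ(F,X);
		SUM:= SUM+X
		END;
WRITELN(SUM) (* 12+7+30 = 49 *)
END.
```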

An alternative method uses a sentinel in the file to mark the end of
the data. A sentinel is a special value, like 999999, of the same type  as
the data, and so can be read by the same READ that reads the data.

	BEGIN
	(*RESET(f) if needed*)
	initialize;
	READ(f,x);
	WHILE(x is not a sentinel) DO
		BEGIN
		process x;
		READ(f,x)
		END;
	(*RESET(f) if needed*)
	finalize
	END

The above program is more efficient, needing only one test per  iteration,
but its structure is rather peculiar; each iteration processes the  number
read on the previous iteration.
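
The sentinel pattern can be made concrete as follows.  This is only a
sketch: the constant name STOP, the program name, and the data are our
assumptions, the sentinel must be a value that can never occur as data, and
MAXINT must be at least 999999 on the machine in use.  As in the other
self-contained sketches, F is a scratch file in standard Pascal; a
translator that requires binding it to an external name first (Free Pascal's
ASSIGN, for instance) needs that step.

```pascal
PROGRAM SENTINEL(OUTPUT);
CONST STOP = 999999; (* sentinel value, never valid as data *)
VAR F: TEXT;
    X, SUM: INTEGER;
BEGIN
REWRITE(F);
WRITELN(F,'12 7 30');
WRITELN(F,STOP);
RESET(F);
SUM:= 0;
READ(F,X);
WHILE X <> STOP DO
	BEGIN
	SUM:= SUM+X; (* process the number read last time *)
	READ(F,X)
	END;
WRITELN(SUM) (* 12+7+30 = 49 *)
END.
```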

To process one line of  characters on a text  file, where every line  must
end with ↓, and  you know a  line of data  is present, so  no EOF test  is
needed:  

	BEGIN 
	initialize;
	WHILE NOT EOLN(f) DO
		BEGIN
		READ(f,c);
		process c
		END;
	(* next character is end-of-line*)
	READ(f,c); (* or GET(f) or READLN(f) *)
	finalize
	END

To  process every  line of  characters  in a text  file,  with  process  A
initializing the whole computation, process B initializing the  processing
of one line, process D finalizing the processing of one line, and  process
E finalizing the whole computation:

	BEGIN
	A;
	(*RESET(f) if needed*)
	WHILE NOT EOF(f) DO
		(* process one line*)
		BEGIN
		B;
		WHILE NOT EOLN(f) DO
			BEGIN
			READ(f,c);
			process c
			END;
		READ(f,c);(* discard end-of-line symbol*)
		D
		END;
	(* end of file*)
	E
	END

Example: a program to test that every line of file f contains at least
one asterisk:

	BEGIN
	(*RESET(f) if needed*)
	EVERYLINE:= TRUE;
	WHILE NOT EOF(f) DO
		BEGIN
		THISLINE:= FALSE;
		WHILE NOT EOLN(f) DO
			BEGIN
			READ(f,c);
			IF c='*' THEN
				THISLINE:= TRUE
			END;
		READ(f,c);
		IF NOT THISLINE THEN
			EVERYLINE:=FALSE
		END;
	IF EVERYLINE THEN WRITE('EVERY LINE HAS A STAR')
	ELSE WRITE ('NOT EVERY LINE HAS A STAR')
	(*RESET(f) if needed*)
	END

Example: a program to read a text file, and left-justify it with as  many
words as possible on a line. The original line breaks are ignored.

PROGRAM(---);
Declarations;
PROCEDURE PROCESS (X: CHAR);
BEGIN
IF X <> ' ' THEN
	BEGIN
	LETCOUNT:= LETCOUNT + 1; (*WORD LENGTH*)
	WORD [LETCOUNT]:= X
	END
ELSE
	BEGIN (* a space ends the current word *)
	IF LINELENGTH + LETCOUNT + 1 > LIMIT THEN
		BEGIN
		WRITELN;
		LINELENGTH:= 0
		END;
	IF LINELENGTH > 0 THEN
		BEGIN
		WRITE (' ');
		LINELENGTH:= LINELENGTH + 1 + LETCOUNT
		END
	ELSE LINELENGTH:= LETCOUNT;
	FOR I:= 1 TO LETCOUNT DO
		WRITE(WORD [I]);
	LETCOUNT:= 0 (* start collecting the next word *)
	END
END;

BEGIN (* MAIN PROGRAM *)

LETCOUNT:= 0;
LINELENGTH:= 0;
LASTC:= ' ';

WHILE NOT EOF DO
	BEGIN
	READ(C);
	IF (LASTC <> ' ' ) OR (C<> ' ' ) THEN
		PROCESS(C);
	LASTC:= C
	END;
(*RESET if needed*)
END

If running time  is important, the  call on PROCESS(C)  can  be
replaced by a slightly  modified copy of the  procedure body. The  program
uses the fact that READ treats the end-of-line symbol as a space.

Pseudo-files

Pascal treats several entities  more or less as  if they were files.   The
external name TTY:  (note the colon) can be used to let the  corresponding
internal name represent data typed at  the terminal keyboard when in  read
mode, and represent  the terminal  screen when  in write  mode.  In  write
mode, this is sometimes useful to see the results of a program immediately
as it writes them, especially if the lines of output are no more than  the
80-character width of the screen.  In read mode, the use of external  name
TTY:  cannot  be recommended; a  program designed for  input from a  true
file is usually not well designed for keyboard input.  See  ______________
for details.

The internal name TTY can also be used to represent the terminal  keyboard
and screen.  Output commands to  TTY, such as WRITE(TTY,X1,X2), are  quite
analogous to output commands for files.   Input commands, however, differ.

During the execution of a Pascal program, the sequence of characters typed
at the keyboard is used as if it were a text file called TTY for all input
operations.  A line typed in is made available to the program only when it
is completed with a carriage return; up until entry of the carriage return
the user may modify  the line at will,  for example by backspacing.   When
the program has consumed all available keyboard input, it stops and  waits
until the user  has typed  another complete  line.  To  avoid impasses,  a
program should request  more input  before consuming  all available  input
characters (in particular,  the final carriage  return).  The  pseudo-file
with internal name TTY  is initialized to  an empty line,  as if a  single
carriage return  had  already been  typed  in,  so that  the  program  can
continue execution.  (Older versions  of the translator required  keyboard
input before  the  program would  start.)  The  pseudo-file  TTY  is  not
mentioned in the program header, nor is it declared.  It is treated as  of
type TEXT.  For input pseudo-files  TTY or TTY:, an end-of-file  condition
is created  only  by  typing  CONTROL/Z;  usually  programs  intended  for
terminal input do not use the end-of-file condition.

The external name LPT: (note the colon) can  be used as a pseudo-file,  in
write mode, for information which is automatically printed upon completion
of execution, and is not retained as a permanent file.

Terminal Input Cliches

Reading data as lines of characters from the terminal is typically done by
the program below.

(*No declaration needed for TTY*)
BEGIN
INITIALIZE;
WRITE(TTY, prompting message); (*Explain to  the user what he must  type*)
READLN(TTY); (*  discard remnants  of previous  line; program  waits  here
until user completes a new line*)
WHILE NOT EOLN(TTY) DO
	BEGIN
	READ(TTY,C);
	Process C
	END;
Process carriage return, if required, without reading it.
END

Alternatively, a line  can be  read into a  /string variable/  S, of  type
ARRAY[1..80] OF CHAR; the program below  tests each line for validity  and
rereads until an acceptable line has been read.

BEGIN
WRITE(TTY, prompting message);
BADDATA:=TRUE;
WHILE BADDATA DO
	BEGIN
	READLN(TTY); (*Discard remnants of earlier line*)
	READ(TTY,S); (*Puts in S the entire line except carriage return*)
	IF(S is acceptable) THEN
		BADDATA:=FALSE
	ELSE
		WRITE(TTY, reprompting message)(*Explain exact requirements 
		for correct input*)
	END
END

The Naming of Files

Like the naming of cats [1], the naming of files should not be  undertaken
lightly.  See the section of the LOTS Overview on file descriptors [ ] for
the rules  about directory  names.  The  extension field  of your  program
should normally be PGO, for translation by the PASSGO translator.  If your
program uses a separate library of subprograms, use the PASCAL  translator
by making PAS the extension field.   The file name proper of your  program
for a  course  assignment  should  begin with  an  identification  of  the
assignment number.   A program  for assignment  8, to  do an  integration,
might be in file P8INT.PGO.  The data  file for a program should have  the
same name except  for extension  field DAT (e.g.,  P8INT.DAT); the  output
file should  have the  same name  except for  extension field  OUT  (e.g.,
P8INT.OUT).  An interactive program should keep a permanent record of  all
input, perhaps  on a  file with  extension field  LOG; this  allows  later
confirmation that data were entered correctly.

It is a common  disastrous error to  give the name of  the program as  the
external name of the output file; this results in deletion of the  program
file.  When this happens,  (1)  DO NOT LOGOUT,  (2) DO NOT EXPUNGE,  until
the deleted file has been recovered.   See Floyd's notes, Appendix C,  for
methods to locate and restore deleted files.  As a safety measure, you can
set the normal number of file generations retained in your directory to 2.
(E.g.,  if your program is in file   P8INT.PGO.10,  and you send output to 
P8INT.PGO, it will  go to   P8INT.PGO.11,  and generations  10 and 11 will 
both be retained.)  If you  do so, you  should  delete all  obsolete files  
at  the end  of  each terminal session.  

